SHAX: The Semantic Historical Archive eXplorer

Authors

  • Michael Feldman
  • Shen Gao
  • Marc Novel
  • Katerina Papaioannou
  • Abraham Bernstein

Abstract

Newspaper archives are some of the richest historical document collections. Studying them is, however, very tedious: one needs to physically visit the archives, search through reams of old, very fragile paper, and manually assemble cross-references. We present Shax, a visual newspaper-archive exploration tool that takes large historical archives as input and lets interested parties browse the information they contain chronologically or geographically, so as to rediscover history. We used Shax on a selection of the Neue Zürcher Zeitung (NZZ), the longest continuously published German-language newspaper in Switzerland, with archives going back to 1780. Specifically, we took the highly noisy OCRed text segments, extracted pertinent entities, geolocation, and temporal information, linked them with the Linked Open Data cloud, and built a browser-based exploration platform. This platform enables users to interactively browse the 111,906 newspaper pages published from 1910 to 1920, which cover historic events such as World War I (WWI) and the Russian Revolution. Note that Shax is limited neither to this newspaper nor to this time period or language; rather, it exemplifies the power of combining semantic technologies with an exceptional dataset.
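
The extraction step described in the abstract (pulling place and time information out of OCRed segments and attaching Linked Open Data identifiers) can be illustrated with a minimal sketch. This is not the authors' implementation: the gazetteer, its approximate coordinates, the URIs, and the names `GAZETTEER`, `Annotation`, and `annotate` are all assumptions made for illustration, and a real system would need far more robust handling of OCR noise.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-gazetteer (an assumption, not from the paper):
# lowercase place name -> (approx. latitude, approx. longitude, illustrative LOD URI).
GAZETTEER = {
    "bern": (46.95, 7.45, "http://dbpedia.org/resource/Bern"),
    "basel": (47.56, 7.59, "http://dbpedia.org/resource/Basel"),
}

# Four-digit years restricted to the 1910-1920 window mentioned in the abstract.
YEAR_PATTERN = re.compile(r"\b(19(?:1\d|20))\b")
WORD_PATTERN = re.compile(r"\w+")


@dataclass
class Annotation:
    places: list  # gazetteer keys found in the segment, sorted
    years: list   # years in 1910..1920 found in the segment, sorted
    uris: list    # illustrative LOD URIs for the matched places


def annotate(segment: str) -> Annotation:
    """Attach place, year, and link annotations to one OCRed text segment."""
    tokens = [t.lower() for t in WORD_PATTERN.findall(segment)]
    places = sorted({t for t in tokens if t in GAZETTEER})
    years = sorted({int(y) for y in YEAR_PATTERN.findall(segment)})
    uris = [GAZETTEER[p][2] for p in places]
    return Annotation(places=places, years=years, uris=uris)


example = annotate("Basel, 3. August 1914. Aus Bern wird gemeldet, dass seit 1780 ...")
```

Annotations of this kind are what would let a browser-based front end place a segment on a map and on a timeline; exact-match lookup is used here only for brevity.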

Similar resources

An Ontology-Based Archive for Historical Research

The digitization of cultural materials is doubtless a key enabler for increasing the accessibility of cultural heritage documents, e.g., historical texts. In the last decade, Semantic Digital Libraries (see, e.g., [1]) have attracted the attention of research communities from different areas, such as Cultural Heritage, History, and Knowledge Engineering. In order to find more innov...

PRiSMHA (Providing Rich Semantic Metadata for Historical Archives)

In this paper we present the PRiSMHA project, whose main goal is to demonstrate that a rich semantic representation of the content of historical documents is useful, since it can significantly improve access to archival resources, and sustainable, thanks to a crowdsourcing approach. This goal poses interesting research challenges, both for the semantic model definition and the user interaction...

Interlinking current affairs with archives via the Semantic Web

The BBC has a very large archive of programmes, covering a wide range of topics. This archive holds a significant part of the BBC’s institutional memory and is an important part of the cultural history of the United Kingdom and the rest of the world. These programmes, or parts of them, can help provide valuable context and background for current news events. However, the BBC’s archive catalogue ...

XML and Knowledge Technologies for Semantic-Based Indexing of Paper Documents

Effective daily processing of large amounts of paper documents in office environments requires the application of semantic-based indexing techniques during the transformation of paper documents to electronic format. For this purpose a combination of both XML and knowledge technologies can be used. XML distinguishes between data, its structure and semantics, allowing the exchange of data element...

Towards Semantic Enrichment of Newspapers: A Historical Ecology Use Case

Historical ecology research relies on historical accounts of human-animal interactions to study this interaction through space and time. Newspaper archives are a rich source of information, but require careful querying and filtering to collect the relevant information. Traditionally, this is a laborious manual task. In this position paper, we describe our ongoing work on semantically enriching ...

Publication date: 2014